Appl. No. 09/661,633

Response to Non-Compliant Amdt. dated May 23, 2006 
Non-Compliant Amdt. dated May 11, 2006

Amendments to the Claims:

This listing of claims will replace all prior versions, and listings, of claims in the application.
Listing of Claims:

1 (Previously presented): A method of detecting a facial region within a video 
comprising the steps of: 

(a) receiving a first frame of said video comprising a plurality of pixels; 

(b) receiving a subsequent frame of said video comprising a plurality of pixels; 

(c) calculating a difference image representative of the difference between a plurality 
of said pixels of said first frame and a plurality of said pixels of said subsequent 
frame; 

(d) determining a plurality of candidate facial regions within said difference image 
based on a transform of said difference image in a spatial domain to a parameter 
space; and 

(e) fitting said plurality of candidate facial regions to said difference image, where 
said difference image used for said fitting is free from being transformed as a 
result of step (d), to select one of said candidate facial regions. 

2 (Original): The method of claim 1 further comprising the step of thresholding said 
difference image thereby removing values of said difference image less than a threshold value. 

3 (Original): The method of claim 2 wherein said threshold value is a predetermined 
value and said removing values is setting said values of said difference image that are less than 
said threshold value to a selected value. 
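
As an illustration of step (c) of claim 1 and the thresholding of claims 2 and 3, the following Python sketch computes a difference image from two grayscale frames and then replaces every value below the threshold with a selected value. It is a minimal sketch under stated assumptions: frames are equal-sized lists of integer rows, the difference is taken as an absolute per-pixel difference, and the function names and the default selected value of 0 are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of integer pixel intensities (assumed layout)


def difference_image(first: Frame, subsequent: Frame) -> Frame:
    """Absolute per-pixel difference between two equally sized frames (claim 1, step (c))."""
    if len(first) != len(subsequent) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(first, subsequent)
    ):
        raise ValueError("frames must have identical dimensions")
    return [
        [abs(a - b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(first, subsequent)
    ]


def threshold(diff: Frame, threshold_value: int, selected_value: int = 0) -> Frame:
    """Replace every value below threshold_value with selected_value (claims 2 and 3)."""
    return [[v if v >= threshold_value else selected_value for v in row] for row in diff]


if __name__ == "__main__":
    f0 = [[10, 10, 10], [10, 10, 10]]
    f1 = [[10, 60, 12], [90, 10, 10]]
    d = difference_image(f0, f1)
    print(d)                 # [[0, 50, 2], [80, 0, 0]]
    print(threshold(d, 20))  # [[0, 50, 0], [80, 0, 0]]
```

The absolute value is used so that a pixel that becomes brighter and one that becomes darker both register as change.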

4 (Original): The method of claim 1 wherein said transform is a Hough transform. 



5 (Currently amended): [[The method of claim 4 wherein said Hough transform is
$A(x_c, y_c, r) = A(x_c, y_c, r) + 1 \;\forall\, x_c, y_c, r \in (x - x_c)^2 + (y - y_c)^2 = r^2$]] A method of detecting a facial
region within a video comprising the steps of:

(a) receiving a first frame of said video comprising a plurality of pixels; 

(b) receiving a subsequent frame of said video comprising a plurality of pixels; 

(c) calculating a difference image representative of the difference between a plurality 
of said pixels of said first frame and a plurality of said pixels of said subsequent 
frame; 

(d) determining a plurality of candidate facial regions within said difference image 
based on a Hough transform of said difference image in a spatial domain to a 
parameter space of the form 

$A(x_c, y_c, r) = A(x_c, y_c, r) + 1 \quad \forall\, x_c, y_c, r \in (x - x_c)^2 + (y - y_c)^2 = r^2$; and

(e) fitting said plurality of candidate facial regions to said difference image, where 
said difference image used for said fitting is free from being transformed as a 
result of step (d), to select one of said candidate facial regions. 
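
The accumulator update in step (d) of claim 5 adds one vote to A(x_c, y_c, r) for every circle that passes through a nonzero pixel of the difference image. The Python sketch below shows one conventional way to cast those votes. Sampling each circle at a fixed number of angles, the default of 64 angles, and the function name `hough_circles` are assumptions made for illustration, not details taken from the claim.

```python
import math
from collections import Counter
from typing import Iterable, List


def hough_circles(binary: List[List[int]], radii: Iterable[int], n_angles: int = 64) -> Counter:
    """Accumulate A(x_c, y_c, r) += 1 for each sampled circle through a nonzero pixel.

    Each circle is sampled at n_angles points (an illustrative choice).
    """
    radii = tuple(radii)  # allow generators: radii are iterated once per pixel
    height, width = len(binary), len(binary[0])
    acc: Counter = Counter()
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            for r in radii:
                seen = set()  # one vote per (x_c, y_c) for this pixel and radius
                for k in range(n_angles):
                    t = 2.0 * math.pi * k / n_angles
                    xc = round(x - r * math.cos(t))
                    yc = round(y - r * math.sin(t))
                    if 0 <= xc < width and 0 <= yc < height and (xc, yc) not in seen:
                        seen.add((xc, yc))
                        acc[(xc, yc, r)] += 1
    return acc
```

The strongest accumulator cells are natural candidate circles for the fitting in step (e). The `seen` set keeps a single pixel from voting twice for the same cell when neighboring angles round to the same center.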

6 (Original): The method of claim 1 where said fitting of each of said candidate facial 
regions is based on a combination of at least three factors including, a fit factor representative of 
a fit of said candidate facial regions to said difference image, a location factor representative of 
the location of said candidate facial regions within said video, and a size factor representative of 
the size of said candidate facial regions. 

7 (Original): The method of claim 1 further comprising the step of scaling said first 
frame and said subsequent frame of said video to reduce the number of said pixels of said first 
and subsequent frame prior to said calculating said difference frame. 
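
Claim 7 reduces the number of pixels before the difference is computed. One simple way to do that, assumed here for illustration rather than taken from the claim, is to average non-overlapping factor-by-factor blocks. The sketch floors the mean to an integer and drops any partial blocks at the right and bottom edges; the name `downscale` is hypothetical.

```python
from typing import List


def downscale(frame: List[List[int]], factor: int) -> List[List[int]]:
    """Reduce pixel count by averaging non-overlapping factor x factor blocks.

    Partial blocks at the right/bottom edges are dropped; means are floored.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    rows = len(frame) // factor
    cols = len(frame[0]) // factor
    out = []
    for by in range(rows):
        row = []
        for bx in range(cols):
            block = [
                frame[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        out.append(row)
    return out
```

Both frames would be scaled by the same factor so that their pixels stay aligned when the difference is taken.
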
8 (Currently amended): [[The method of claim 1 wherein said step of determining said
plurality of said candidate facial regions and fitting said plurality of said candidate facial regions
further comprises the steps of:]] A method of detecting a facial region within a video comprising
the steps of:

(a) receiving a first frame of said video comprising a plurality of pixels; 

(b) receiving a subsequent frame of said video comprising a plurality of pixels; 

(c) calculating a difference image representative of the difference between a plurality 
of said pixels of said first frame and a plurality of said pixels of said subsequent 
frame; 

(d) determining a plurality of candidate facial regions within said difference image 
based on a transform of said difference image in a spatial domain to a parameter 
space wherein said step of determining said plurality of said candidate facial 
regions and fitting said plurality of said candidate facial regions further comprises 
the steps of: 

[[(a)]] (i) determining a set of candidate circles based on a Hough transform of said
difference image; 

[[(b)]] (ii) scoring said set of said candidate circles based on a combination of at least
three factors including, a fit factor representative of the fit of said 
candidate circles to said difference image, a location factor representative 
of the location of said candidate circles within said video, and a size factor 
representative of the size of said candidate circles; 

[[(c)]] (iii) selecting at least one of said candidate circles based on said scoring;

[[(d)]] (iv) generating at least one candidate facial region having an elliptical shape
for each of said at least one of said candidate circles; and 

[[(e)]] (v) scoring each of said candidate facial regions based on a combination of at
least three factors including, a fit factor representative of the fit of a 
respective said candidate facial region to said difference image, a location 
factor representative of the location of said respective said candidate facial 
region within said video, and a size factor representative of the size of said 
respective said candidate facial region; and 



(e) fitting said plurality of candidate facial regions to said difference image, where 
said difference image used for said fitting is free from being transformed as a 
result of step (d), to select one of said candidate facial regions. 
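
Claims 6 and 8 combine a fit factor, a location factor, and a size factor into a single score. The claims do not say how each factor is computed or weighted, so the Python sketch below uses hypothetical definitions: fit is the fraction of sampled boundary points that land on nonzero difference pixels, location falls off linearly with distance from the frame center, size compares the radius with an expected face radius, and the three are combined as a weighted sum with illustrative weights of 0.5, 0.25, and 0.25. Circles are scored here for brevity; the same factors could be applied to the elliptical candidates of step (v).

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    xc: float  # center column
    yc: float  # center row
    r: float   # radius in pixels


def fit_factor(binary: List[List[int]], c: Candidate, n_samples: int = 64) -> float:
    """Fraction of sampled boundary points that fall on nonzero difference pixels."""
    height, width = len(binary), len(binary[0])
    hits = 0
    for k in range(n_samples):
        t = 2.0 * math.pi * k / n_samples
        x = round(c.xc + c.r * math.cos(t))
        y = round(c.yc + c.r * math.sin(t))
        if 0 <= x < width and 0 <= y < height and binary[y][x]:
            hits += 1
    return hits / n_samples


def location_factor(c: Candidate, width: int, height: int) -> float:
    """1.0 at the frame center, falling linearly to 0.0 at a corner."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    return 1.0 - math.hypot(c.xc - cx, c.yc - cy) / max(math.hypot(cx, cy), 1e-9)


def size_factor(c: Candidate, expected_r: float) -> float:
    """1.0 when the radius matches the expected face radius, 0.0 when off by 100% or more."""
    return max(0.0, 1.0 - abs(c.r - expected_r) / expected_r)


def score(binary: List[List[int]], c: Candidate, expected_r: float,
          weights: Tuple[float, float, float] = (0.5, 0.25, 0.25)) -> float:
    """Hypothetical weighted sum of the fit, location, and size factors."""
    height, width = len(binary), len(binary[0])
    w_fit, w_loc, w_size = weights
    return (w_fit * fit_factor(binary, c)
            + w_loc * location_factor(c, width, height)
            + w_size * size_factor(c, expected_r))


def select_best(binary: List[List[int]], candidates: Sequence[Candidate],
                expected_r: float) -> Candidate:
    """Pick the candidate with the highest combined score."""
    return max(candidates, key=lambda c: score(binary, c, expected_r))
```

A weighted sum is only one way to combine the factors; a product of factors would instead let any single poor factor veto a candidate.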

9 (Original): The method of claim 8 wherein said generating at least one candidate facial 
region has a center of said elliptical shape located within a bounded region of potential locations 
having a greater vertical dimension than a horizontal dimension centered about the center of said 
respective said candidate circle. 
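
Claim 9 places each candidate ellipse center inside a bounded region that is taller than it is wide and centered on the candidate circle's center. The Python sketch below enumerates such centers on an integer grid. The half-width of 1 pixel, the half-height of 2 pixels, the horizontal radius equal to the circle radius, and the vertical-to-horizontal aspect of 1.4 are all hypothetical choices made for illustration.

```python
from typing import List, NamedTuple


class Ellipse(NamedTuple):
    xc: int    # center column
    yc: int    # center row
    xr: float  # horizontal radius
    yr: float  # vertical radius


def candidate_ellipses(xc: int, yc: int, r: float,
                       half_w: int = 1, half_h: int = 2,
                       aspect: float = 1.4) -> List[Ellipse]:
    """Ellipse centers on a grid taller than it is wide, centered on the circle center."""
    if half_h <= half_w:
        raise ValueError("the bounded region must be taller than it is wide")
    return [
        Ellipse(xc + dx, yc + dy, r, aspect * r)
        for dy in range(-half_h, half_h + 1)
        for dx in range(-half_w, half_w + 1)
    ]
```

Each generated ellipse would then be scored as in step (v) of claim 8; the taller search region reflects that a face usually extends further vertically than horizontally.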

10 (Previously presented): A method of detecting a facial region within a video 
comprising the steps of: 

(a) receiving a first frame of said video comprising a plurality of pixels; 

(b) receiving a subsequent frame of said video comprising a plurality of pixels; 

(c) calculating a difference frame representative of the difference between a plurality 
of said pixels of said first frame and a plurality of said pixels of said subsequent 
frame; 

(d) determining a plurality of candidate facial regions within said difference frame; 
and 

(e) fitting said candidate facial regions to said difference image to select one of said 
candidate facial regions based on a combination of at least two of the following 
three factors including, a fit factor representative of the fit of said candidate facial 
regions to said difference image, a location factor representative of the location of 
said candidate facial regions within said difference image, and a size factor 
representative of the size of said candidate facial regions. 

11 (Original): The method of claim 10 where said determining said candidate facial
regions is based on a Hough transform of said difference image in a spatial domain to a
parameter space.
12 (Currently amended): [[The method of claim 11 wherein said Hough transform is
$A(x_c, y_c, r) = A(x_c, y_c, r) + 1 \;\forall\, x_c, y_c, r \in (x - x_c)^2 + (y - y_c)^2 = r^2$]] A method of detecting a facial
region within a video comprising the steps of:

(a) receiving a first frame of said video comprising a plurality of pixels; 

(b) receiving a subsequent frame of said video comprising a plurality of pixels;

(c) calculating a difference frame representative of the difference between a plurality 
of said pixels of said first frame and a plurality of said pixels of said subsequent 
frame; 

(d) determining a plurality of candidate facial regions within said difference frame 
based on a Hough transform of said difference image in a spatial domain to a 
parameter space wherein said Hough transform is 

$A(x_c, y_c, r) = A(x_c, y_c, r) + 1 \quad \forall\, x_c, y_c, r \in (x - x_c)^2 + (y - y_c)^2 = r^2$; and

(e) fitting said candidate facial regions to said difference image to select one of said 
candidate facial regions based on a combination of at least two of the following 
three factors including, a fit factor representative of the fit of said candidate facial 
regions to said difference image, a location factor representative of the location of 
said candidate facial regions within said difference image, and a size factor 
representative of the size of said candidate facial regions. 

13 (Original): The method of claim 10 further comprising the step of thresholding said 
difference image thereby removing values of said difference image less than a threshold value. 

14 (Original): The method of claim 10 further comprising the step of scaling said first 
frame and said subsequent frame of said video to reduce the number of said pixels of said first 
and subsequent frame prior to said calculating said difference frame. 
15 (Previously presented): A method of determining sensitivity information for a video
comprising the steps of: 

(a) receiving a first frame of said video; 

(b) receiving a subsequent frame of said video; 

(c) determining a spatial location of a facial region within said video based on at least 
said first and said subsequent frame; and 

(d) calculating a sensitivity value for each of a plurality of spatial locations within 
said video based upon both said spatial location of said facial region within said 
video in relation to said plurality of spatial locations and a non-linear model of the
sensitivity of a human visual system's ability to perceive image detail at eccentric 
visual angles. 

16 (Original): The method of claim 15 wherein the step of said calculating said 
sensitivity values is further based upon calculating an eccentricity versus image location in 
relation to a viewer of said video for said plurality of locations within said video. 

17 (Original): The method of claim 16 wherein said calculating said sensitivity is further 
based upon a sensitivity versus eccentricity non-linear model of said human visual system. 

18 (Currently amended): The method of claim 16 wherein said eccentricity is derived 
according to the following: 
where θ_e is said eccentricity, y is a vertical pixel position within said video, x is a horizontal
position within said video, x_c represents a horizontal component of a center position of an
elliptical said facial region, y_c represents a vertical component of said center position of said
elliptical said facial region, x_r represents a first elliptical radius of said elliptical said facial feature
in a horizontal direction, y_r represents a second elliptical radius of said elliptical said facial feature
in a vertical direction, and V represents a viewing distance.

19 (Original): The method of claim 15 wherein said sensitivity values are based upon the 
distance from the outer edge of said facial region to said plurality of locations within said video. 

20 (Original): The method of claim 17 wherein said sensitivity versus eccentricity non- 
linear model is derived according to the following, 

$S = \dfrac{1}{1 + k_{ECC}\,\theta_E}$

where S is representative of said sensitivity, k_ECC is a constant, and θ_E is representative of a
non-linear contrast sensitivity function.
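
Reading the model of claim 20 as S = 1/(1 + k_ECC·θ_E), the Python sketch below evaluates the sensitivity for a given eccentricity. The eccentricity expression of claim 18 is not reproduced in this listing, so the sketch substitutes a hypothetical stand-in in the spirit of claim 19: the visual angle, in degrees, subtended by a pixel's distance beyond the edge of the facial region when viewed from distance V, with points inside the region assigned zero eccentricity. The function names and the value of k_ECC used below are illustrative assumptions.

```python
import math


def eccentricity_deg(distance_px: float, viewing_distance_px: float) -> float:
    """Hypothetical stand-in for theta_e (not the claim 18 formula).

    Visual angle, in degrees, of a pixel's distance beyond the edge of the
    facial region, seen from viewing distance V (both in pixel units).
    Points inside the region (distance <= 0) get zero eccentricity.
    """
    return math.degrees(math.atan2(max(distance_px, 0.0), viewing_distance_px))


def sensitivity(theta_e: float, k_ecc: float) -> float:
    """Non-linear model S = 1 / (1 + k_ecc * theta_e): 1.0 at the fixation region."""
    return 1.0 / (1.0 + k_ecc * theta_e)
```

For example, with a viewing distance of 100 pixels and a hypothetical k_ECC of 0.1, a pixel 100 pixels beyond the edge of the facial region sits at 45 degrees of eccentricity and receives a sensitivity of 1/(1 + 4.5), about 0.18.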

21 (Previously presented): A method of encoding a video comprising the steps of: 

(a) receiving a frame of said video consisting of a plurality of pixels; 

(b) calculating sensitivity information for a plurality of locations within said frame of 
said video calculated based upon the sensitivity of a human visual system of a 
viewer perceiving image detail at eccentric visual angles of a particular region of 
said frame of said video, where said particular region of said frame is determined 
based upon the content of the frame itself; and 

(c) encoding said frame in a manner that provides a substantially uniform apparent 
quality to perceiving detail at eccentric visual angles of said plurality of locations 
of said frame to said viewer when said viewer is observing said particular region 
of said frame of said video. 
22 (Previously presented): The method of claim 21 wherein said encoding of each of
said plurality of locations of said frame of said video is based on a respective quantization value 
representative of a base quantization factor divided by said sensitivity information for a 
respective one of said plurality of locations in a manner that said encoding employs at least two 
different quantization values, where said plurality of locations within said video are determined 
based upon the content of the frame itself. 

23 (Previously presented): The method of claim 22 wherein said encoding is derived in 
accordance with the following: 

$Q/S_1, Q/S_2, Q/S_3, \ldots, Q/S_N$
where Q is representative of said base quantization factor, and S_1 through S_N are representative
of said sensitivity information for said plurality of locations.
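
Claims 22 and 23 assign each location the base quantization factor Q divided by that location's sensitivity, so locations the viewer perceives less sharply (lower S_k) receive coarser quantization. The Python sketch below computes these per-location values. Rounding to an integer and clamping to the range 1 to 31 (the quantizer range of H.263-style coders) are assumptions layered on top of the claim, and `location_quantizers` is a hypothetical name.

```python
from typing import List, Sequence


def location_quantizers(q_base: float, sensitivities: Sequence[float],
                        q_min: int = 1, q_max: int = 31) -> List[int]:
    """Per-location quantizers Q/S_1, ..., Q/S_N, rounded and clamped (assumed range)."""
    out = []
    for s in sensitivities:
        if s <= 0:
            raise ValueError("sensitivity must be positive")
        out.append(min(q_max, max(q_min, round(q_base / s))))
    return out
```

Because the sensitivity divides Q, a location at the fixation point (S = 1) keeps the base quantizer while peripheral locations are quantized more coarsely, which is what yields at least two different quantization values in claim 22.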

24 (Currently amended): [[The method of claim 23 wherein one of said S_k, where k is a
value from 1 to N, is derived based upon a statistical calculation of a plurality of said sensitivity
information for one of said locations of said image.]] A method of encoding a video comprising
the steps of:

(a) receiving a frame of said video consisting of a plurality of pixels; 

(b) calculating sensitivity information for a plurality of locations within said frame of 
said video calculated based upon the sensitivity of a human visual system of a 
viewer perceiving image detail at eccentric visual angles of a particular region of 
said frame of said video, where said particular region of said frame is determined 
based upon the content of the frame itself; and 

(c) encoding said frame in a manner that provides a substantially uniform apparent 
quality to perceiving detail at eccentric visual angles of said plurality of locations 
of said frame to said viewer when said viewer is observing said particular region 
of said frame of said video, wherein said encoding of each of said plurality of 
locations of said frame of said video is based on a respective quantization value 
representative of a base quantization factor divided by said sensitivity information 
for a respective one of said plurality of locations in a manner that said encoding 
employs at least two different quantization values, where said plurality of
locations within said video are determined based upon the content of the frame
itself, said encoding being derived in accordance with the following:

$Q/S_1, Q/S_2, Q/S_3, \ldots, Q/S_N$

where Q is representative of said base quantization factor, and S_1 through S_N are
representative of said sensitivity information for said plurality of locations,
wherein one of said S_k, where k is a value from 1 to N, is derived based upon a
statistical calculation of a plurality of said sensitivity information for one of said
locations of said image.



25 (Original): The method of claim 24 wherein S_k is an average of said plurality of said
sensitivity information. 
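
Claims 24 and 25 derive one S_k per location from a statistic of the finer-grained sensitivity values inside that location. The Python sketch below assumes the locations are square, non-overlapping blocks of a per-pixel sensitivity map (an assumption; the claims do not fix the location shape) and takes the average by default. Passing `max` instead gives the maximum variant of claim 57.

```python
from statistics import mean
from typing import Callable, List, Sequence

SensMap = List[List[float]]  # per-pixel sensitivity values (assumed layout)


def block_sensitivities(sens_map: SensMap, block: int,
                        stat: Callable[[Sequence[float]], float] = mean) -> SensMap:
    """One S_k per block: a statistic (mean by default) of the block's sensitivity values."""
    rows = len(sens_map) // block
    cols = len(sens_map[0]) // block
    return [
        [stat([sens_map[by * block + dy][bx * block + dx]
               for dy in range(block) for dx in range(block)])
         for bx in range(cols)]
        for by in range(rows)
    ]
```

The average smooths the sensitivity across a block, while the maximum is more conservative: it quantizes a block as finely as its most sensitive pixel requires.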

26 (Original): The method of claim 21 wherein said encoding of said frame of said 
video includes at least two different quantization values. 

27 (Original): The method of claim 21 wherein said encoding said frame results in the 
total number of bits produced for said frame being substantially equal to a preselected number. 

28 (Original): The method of claim 27 wherein said frame is encoded only once. 

29 (Original): The method of claim 27 wherein said encoding of each of said plurality of 
locations is based on a respective quantization value representative of a base quantization factor 
divided by said sensitivity information for a respective one of said plurality of locations. 

30 (Previously presented): The method of claim 29 wherein said base quantization factor 
is derived in accordance with the following: 
where A is representative of the number of pixels in one of said plurality of locations, K and C
are constants associated with said plurality of locations, N is representative of the number of said
plurality of locations, B is representative of said total number of bits, the σ_i values are a measure
of how much texture is associated with said plurality of locations, and the S_i values are
representative of the respective said sensitivity information squared.
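
The formula of claim 30 is not reproduced in this listing. As a hedged illustration only, assume each location i produces $A(K\sigma_i^2/Q_i^2 + C)$ bits when coded with quantizer $Q_i = Q/S_i$. Requiring the bits of all N locations to sum to B gives $A K \sum_i \sigma_i^2 S_i^2 / Q^2 + A N C = B$, and therefore $Q = \sqrt{A K \sum_i \sigma_i^2 S_i^2 / (B - A N C)}$. This uses the same symbols as claim 30 but is not asserted to be the claimed expression. The Python sketch computes this Q and checks it against the assumed bit model; both function names are hypothetical.

```python
import math
from typing import Sequence


def base_quantizer(A: float, K: float, C: float, B: float,
                   sigmas: Sequence[float], sens: Sequence[float]) -> float:
    """Q = sqrt(A*K*sum(sigma_i^2 * S_i^2) / (B - A*N*C)) under the assumed bit model."""
    if len(sigmas) != len(sens):
        raise ValueError("need one sensitivity per location")
    n = len(sigmas)
    budget = B - A * n * C  # bits left after per-location overhead
    if budget <= 0:
        raise ValueError("bit budget does not cover the overhead bits")
    energy = sum((sig * s) ** 2 for sig, s in zip(sigmas, sens))
    return math.sqrt(A * K * energy / budget)


def predicted_bits(A: float, K: float, C: float, q_base: float,
                   sigma: float, s: float) -> float:
    """Assumed model: bits = A * (K * sigma^2 / Q_i^2 + C) with Q_i = Q / S_i."""
    q_i = q_base / s
    return A * (K * sigma ** 2 / q_i ** 2 + C)
```

Under this model the frame is coded once with a quantizer chosen in advance, in line with claims 27 and 28, because the predicted bits for all locations sum exactly to B.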

31 (Previously presented): A method for encoding multiple blocks in a frame of image
data, comprising: 

(a) identifying a target bit value equal to a total number of bits available for encoding 
the frame; 

(b) calculating sensitivity information for each one of the blocks based upon the 
sensitivity of a human visual system perceiving image detail at eccentric visual 
angles of a particular region of the image, where said eccentricity of said 
particular region of said image is determined based upon the content of the frame 
itself; 

(c) adapting quantization values for each of the multiple blocks to provide 
substantially uniform apparent quality to perceiving detail at eccentric visual 
angles of each of the blocks in the frame subject to a constraint that the total 
number of bits available for encoding the frame is equal to the target bit value; 
and 

(d) encoding the blocks with the quantization values. 

32 (Previously presented): The method of claim 31 wherein the quantization values are 
derived according to the following, 



where Q_i is the quantization value for each block i, N is the number of blocks in the frame, B is
the total number of bits available for encoding the frame, A is a number of pixels in each of the
multiple blocks, K and C are constants associated with the image blocks, σ_i is an empirical
standard deviation of pixel values in the block, and S_i is a weighting incorporating the sensitivity
information for the block.

33 (Original): The method of claim 31 including adjusting the quantization values 
according to a number of image blocks remaining to be encoded, a number of bits still available 
for encoding the remaining image blocks, and a value that depends on the sensitivity and texture 
of the remaining image blocks. 

34 (Original): The method of claim 32 including using a K parameter and a C parameter 
on a block-by-block basis to adjust the quantization values for each of the multiple blocks, the K 
parameter modeling correlation statistics of the pixels in the image blocks and the C parameter 
modeling bits required to code overhead data. 

35 (Original): The method of claim 34 including deriving the optimum quantization 
values in either a fixed mode where the K and C parameters are known in advance or an adaptive 
mode where the K and C parameters are derived according to the K and C parameters of 
previously encoded blocks. 

36 (Original): The method of claim 35 wherein the adaptive mode includes the following steps:

(a) deriving values for the K and C parameters that exactly predict the number of bits 
B used for encoding previous blocks; 

(b) deriving averages for the derived K and C parameters for the previously encoded 
video blocks; and 

(c) predicting the K and C parameters for a next video block by weighting the 
average K and C parameters according to the initial estimates for the K and C 
parameters. 
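
The adaptive mode of claim 36 can be sketched in Python under an assumed bit model in which a block's texture bits are A·K·σ²/Q² and its overhead bits are A·C. The sketch further assumes the encoder reports texture and overhead bits separately, so that step (a) can solve for K and C exactly from a single block. The blending weight j/N, which trusts the running averages more as more blocks are encoded, is a hypothetical choice for step (c); `KCEstimator` and its methods are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class KCEstimator:
    """Adaptive K/C prediction sketch under an assumed bit model:
    texture bits = A*K*sigma^2/Q^2 and overhead bits = A*C for each block."""
    A: int            # pixels per block
    n_blocks: int     # N, blocks in the frame
    k_init: float     # initial (fixed-mode) estimate of K
    c_init: float     # initial (fixed-mode) estimate of C
    k_hist: List[float] = field(default_factory=list)
    c_hist: List[float] = field(default_factory=list)

    def update(self, texture_bits: float, overhead_bits: float, sigma: float, q: float) -> None:
        """Step (a): K and C that exactly reproduce one encoded block's bit counts."""
        if sigma > 0:  # K is undefined for a flat block
            self.k_hist.append(texture_bits * q * q / (self.A * sigma * sigma))
        self.c_hist.append(overhead_bits / self.A)

    def predict(self) -> Tuple[float, float]:
        """Steps (b) and (c): average the derived values, then blend with the initial estimates."""
        j = len(self.c_hist)
        w = min(1.0, j / self.n_blocks)  # hypothetical weight: trust history more as it grows
        k_avg = sum(self.k_hist) / len(self.k_hist) if self.k_hist else self.k_init
        c_avg = sum(self.c_hist) / j if j else self.c_init
        return (w * k_avg + (1.0 - w) * self.k_init,
                w * c_avg + (1.0 - w) * self.c_init)
```

Before any block is encoded the prediction equals the initial estimates, which corresponds to the fixed mode of claim 35.
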
37 (Currently amended): A method for encoding video comprising the steps of:

(a) detecting the location of a facial region of a frame of said video; 

(b) calculating a sensitivity value for each of a plurality of locations within said frame 
of said video based upon said location of said facial region wherein said 
sensitivity values are calculated based upon a non-temporal said location of said 
facial region, a non-temporal size of said facial region, and a non-linear model of 
the human visual system's ability to perceive image detail at eccentric visual 
angles; and

(c) encoding said frame in a manner that provides a substantially uniform apparent
quality to perceiving detail at eccentric visual angles of said plurality of locations 
to said viewer when said viewer is observing said facial region of said video. 

38 (Canceled). 

39 (Original): The method of claim 37 wherein said detecting said location of said facial 
region of said frame further comprises the steps of: 

(a) receiving a first frame of said video comprising a plurality of pixels; 

(b) receiving a subsequent frame of said video comprising a plurality of pixels; 

(c) calculating a difference image representative of the difference between a plurality 
of said pixels of said first frame and a plurality of said pixels of said subsequent 
frame; 

(d) determining a plurality of candidate regions within said difference image; and 

(e) fitting said plurality of candidate regions to said difference image to select said 
facial region. 

40 (Original): The method of claim 39 wherein said determining said plurality of 
candidate regions is based on a Hough transform of said difference image in a spatial domain to
a parameter space. 
41 (Original): The method of claim 39 further comprising the step of thresholding said
difference image thereby removing values of said difference image less than a threshold value. 

42 (Original): The method of claim 41 wherein said threshold value is a predetermined 
value and said removing values is setting said values less than said threshold value to a selected 
value. 

43 (Currently amended): [[The method of claim 40 wherein said Hough transform is
$A(x_c, y_c, r) = A(x_c, y_c, r) + 1 \;\forall\, x_c, y_c, r \in (x - x_c)^2 + (y - y_c)^2 = r^2$]] A method for encoding video
comprising the steps of:

(a) detecting the location of a facial region of a frame of said video comprising the 
steps of: 

(1) receiving a first frame of said video comprising a plurality of pixels; 

(2) receiving a subsequent frame of said video comprising a plurality of 
pixels; 

(3) calculating a difference image representative of the difference between a 
plurality of said pixels of said first frame and a plurality of said pixels of 
said subsequent frame; 

(4) determining a plurality of candidate regions within said difference image 
wherein said determining said plurality of candidate regions is based on a 
Hough transform of said difference image in a spatial domain to a
parameter space wherein said Hough transform is 

$A(x_c, y_c, r) = A(x_c, y_c, r) + 1 \quad \forall\, x_c, y_c, r \in (x - x_c)^2 + (y - y_c)^2 = r^2$; and

(5) fitting said plurality of candidate regions to said difference image to select 
said facial region; 

(b) calculating a sensitivity value for each of a plurality of locations within said frame 
of said video based upon said location of said facial region; and 
(c) encoding said frame in a manner that provides a substantially uniform apparent
quality to perceiving detail at eccentric visual angles of said plurality of locations
to said viewer when said viewer is observing said facial region of said video.

44 (Original): The method of claim 39 where said fitting of each of said candidate 
regions is based on a combination of at least two of the following three factors including, a fit 
factor representative of a fit of said candidate regions to said difference image, a location factor 
representative of the location of said candidate regions within said video, and a size factor 
representative of the size of said candidate regions. 

45 (Original): The method of claim 39 further comprising the step of scaling said first 
frame and said subsequent frame of said video to reduce the number of said pixels of said first 
and subsequent frame prior to said calculating said difference frame. 

46 (Original): The method of claim 39 wherein said step of determining said plurality of 
said candidate regions and fitting said plurality of said candidate regions further comprises the 
steps of: 

(a) determining a set of candidate circles based on a Hough transform of said 
difference image; 

(b) scoring said set of said candidate circles based on a combination of at least three 
factors including, a fit factor representative of the fit of said candidate circles to 
said difference image, a location factor representative of the location of said 
candidate circles within said video, and a size factor representative of the size of 
said candidate circles; 

(c) selecting at least one of said candidate circles based on said scoring; 

(d) generating at least one candidate region having an elliptical shape for each of said 
at least one of said candidate circles; and 
(e) scoring each of said candidate regions based on a combination of at least three
factors including, a fit factor representative of the fit of a respective said candidate
region to said difference image, a location factor representative of the location of
said respective said candidate region within said video, and a size factor
representative of the size of said respective said candidate region.

47 (Original): The method of claim 46 wherein said generating at least one candidate 
region has a center of said elliptical shape located within a bounded region of potential locations 
having a greater vertical dimension than a horizontal dimension centered about the center of said 
respective said candidate circle. 

48 (Currently amended): The method of claim [[38]] 37 wherein the step of
said calculating said sensitivity values is further based upon calculating an eccentricity versus
image location in relation to a viewer of said video for said plurality of locations within said
video.

49 (Currently amended): The method of claim 48 wherein said eccentricity is derived 
according to the following, 



where θ_e is said eccentricity, y is a vertical pixel position within said video, x is a horizontal
position within said video, x_c represents a horizontal component of a center position of an
elliptical said facial region, y_c represents a vertical component of said center position of said
elliptical said facial region, x_r represents a first elliptical radius of said elliptical said facial feature
in a horizontal direction, y_r represents a second elliptical radius of said elliptical said facial feature
in a vertical direction, and V represents a viewing distance.
50 (Currently amended): The method of claim [[38]] 37 wherein said sensitivity values are
based upon the distance from the outer edge of said facial region to said plurality of locations 
within said video. 

51 (Currently amended): The method of claim [[38]] 37 wherein said sensitivity versus
eccentricity non-linear model is derived according to the following,

$S = \dfrac{1}{1 + k_{ECC}\,\theta_E}$

where S is representative of said sensitivity, k_ECC is a constant, and θ_E is representative of a
non-linear contrast sensitivity function.

52 (Canceled). 

53 (Original): The method of claim 37 wherein said encoding of each of said plurality of 
locations is based on a respective quantization value representative of a base quantization factor 
divided by said sensitivity information for a respective one of said plurality of locations. 

54 (Previously presented): The method of claim 53 wherein said encoding is derived in 
accordance with the following: 

$Q/S_1, Q/S_2, Q/S_3, \ldots, Q/S_N$
where Q is representative of said base quantization factor, and S_1 through S_N are representative
of said sensitivity information for said plurality of locations.

55 (Original): The method of claim 54 wherein one of said S_k, where k is a value from 1
to N, is derived based upon a statistical calculation of a plurality of said sensitivity information 
for one of said locations of said image. 
56 (Original): The method of claim 55 wherein S_k is an average of said plurality of said
sensitivity information. 

57 (Original): The method of claim 55 wherein S_k is a maximum of said plurality of said
sensitivity information. 

58 (Original): The method of claim 52 wherein said encoding said frame results in the 
total number of bits produced for said frame being substantially equal to a preselected number. 

59 (Original): The method of claim 58 wherein said frame is encoded only once. 

60 (Original): The method of claim 58 wherein said encoding of each of said plurality of 
locations is based on a respective quantization value representative of a base quantization factor 
divided by said sensitivity information for a respective one of said plurality of locations. 

61 (Previously presented): The method of claim 60 wherein said base quantization factor 
is derived in accordance with the following: 



where A is representative of the number of pixels in one of said plurality of locations, K and C
are constants associated with said plurality of locations, N is representative of the number of said
plurality of locations, B is representative of said total number of bits, the σ_i values are a measure
of how much texture is associated with said plurality of locations, and the S_i^2 values are
representative of the respective said sensitivity information squared.